In toto light sheet fluorescence microscopy live imaging datasets of Ceratitis capitata embryonic development

The Mediterranean fruit fly (medfly), Ceratitis capitata, is an important model organism in biology and agricultural research with high economic relevance. However, information about its embryonic development is still sparse. We share nine long-term live imaging datasets acquired with light sheet fluorescence microscopy (484.5 h total recording time, 373 995 images, 256 Gb) with the scientific community. Six datasets show the embryonic development in toto for about 60 hours at 30-minute intervals along four directions in three spatial dimensions, covering approximately 97% of the entire embryonic development period. Three datasets focus on germ cell formation and head involution. All imaged embryos hatched morphologically intact. Based on these data, we suggest a two-level staging system that functions as a morphogenetic framework for upcoming studies on medfly. Our data support research on wild-type or aberrant morphogenesis, quantitative analyses, comparative approaches to insect development as well as studies related to pest control. Further, they can be used to test advanced image processing approaches or to train machine learning algorithms and/or neural networks.


Background & Summary
In insect developmental biology, comparative approaches shed light on the broad variety of developmental strategies and contribute to our understanding of the evolution of development 1. To study embryonic morphogenesis on the cellular and subcellular level, light sheet fluorescence microscopy (LSFM) has become the method of choice. It allows non-invasive live imaging of millimeter-sized specimens for time periods up to several days 2-8 and has already been successfully applied to characterize the embryonic morphogenesis of several insect species such as the fruit fly Drosophila melanogaster 9-15, the scuttle fly Megaselia abdita 16 and the red flour beetle Tribolium castaneum 17-20. Due to the intrinsic properties of LSFM 21, e.g., a high signal-to-noise ratio and good depth penetration in conjunction with nearly no photobleaching and phototoxicity, the acquired datasets typically provide a rich collection of high-quality images with excellent temporal and good three-dimensional spatial resolutions. The quality and quantity of the acquired data usually exceed the requirements for the respective study, and in many cases, only a fraction of the data is analyzed. Thus, it is convenient to share these data as an open access resource with the scientific community, since carefully staged morphogenetic information supports research on wild-type or aberrant development, enables quantitative analyses and fosters comparative approaches. Further, systematically acquired image data can be used to train machine learning algorithms and/or neural networks and are thus a major step towards high-volume AI-based research in developmental biology.
During the past decades, the number of insect model organisms increased considerably 22,23. The Mediterranean fruit fly (medfly), Ceratitis capitata (Wiedemann), which belongs to the Diptera order, is a highly invasive agricultural pest with high economic relevance and has become an important model organism for basic as well as pest management-associated research 24. Standard and advanced techniques, such as germline transformation 25,26, site-specific recombination 27, targeted gene editing 28,29, and cryopreservation 30, are established. Further, the medfly genome sequence, available since 2016 31, was recently improved 32. D. melanogaster and C. capitata are both members of the Schizophora section but belong to different families. Phylogenetic analyses have shown that they diverged approximately 80-100 million years ago 33-35. Regarding their embryonic development, both genera share apomorphic characteristics such as reduced extra-embryonic membranes, i.e., they form and degrade only one dorsally located membrane, the amnioserosa 36. The closest comprehensively examined model organism, M. abdita 37, which is a member of the Aschiza section that diverged from D. melanogaster and C. capitata approximately 150 million years ago 38, develops two extra-embryonic membranes 39-41. Thus, C. capitata bridges the phylogenetic gap between D. melanogaster and M. abdita and complements the existing pool of Dipteran model organisms for evolutionary developmental biology research. Studies on various biological questions have already been published, such as spatiotemporal gene expression patterns 42,43, transcriptomics 44-46, oogenesis 47, larval morphology 48 and antennal lobe structure 49, but no comprehensive morphogenetic data and staging system for embryonic development have been available.
Using the cobweb holder approach 21,50, an easy-to-use mounting method for insect embryos in LSFM, we recorded nine live imaging datasets of nine individual medfly embryos (484.5 h total recording time, 373 995 images, 256 Gb). Six datasets show the embryonic development in toto at 30-minute intervals along four directions in three spatial dimensions, covering approximately 97% of the entire embryonic development period. Since the embryos were recorded at room temperature (23 ± 1 °C), development lasted for about 60 h 51. The remaining three datasets focus on specific processes, such as germ cell formation and head involution. All imaged specimens hatched morphologically intact, and all but one developed into a healthy adult. Based on the acquired datasets, we established a morphogenesis-based two-level staging system that serves as a framework for future developmental studies in the medfly. We comprehensively quantify the temporal course of embryogenesis both in absolute terms and relative to total embryonic development, and calculate the respective standard deviations for all time points. Taken together, our study provides the first long-term live imaging data of medfly embryonic development and thus contributes considerably to insect developmental biology, the comparative approach and pest management-associated research.

Methods
Transgenic medfly line and culture. This study used the TREhs43-hidAla5_F1m2 transgenic medfly line, which expresses nuclear-localized EGFP under control of the D. melanogaster polyubiquitin promoter 52. Medfly cultures, homozygous for the transgene, were kept at 25 °C, 50% relative humidity in a 12-h light/12-h dark cycle (DR-36VL, Percival Scientific, Perry, IA, United States) in transparent acrylic boxes (approximately 15 × 15 × 20 cm) in groups of around 80 individuals. The boxes had lateral openings that were covered with fine-meshed gauze to allow embryo deposition. Medflies were reared on a 2:1 mixture of refined sugar (524973, REWE Markt GmbH, Köln, Germany) to inactive dry yeast (62-106, Flystuff, San Diego, CA, USA) that was moistened with autoclaved tap water. Additionally, autoclaved tap water was provided on a wet tissue.
Light sheet fluorescence microscopy. LSFM was implemented with a sample chamber-based digital scanned laser light sheet fluorescence microscope (DSLM, Fig. 1a) 21, which generates a dynamic light sheet by rapidly scanning a Gaussian laser beam with a two-axis piezo-driven scanning mirror (M-116.DG, Physik Instrumente GmbH & Co KG, Karlsruhe, Germany). As the illumination light source, a 488 nm/60 mW diode laser (PhoxX 488-20, Omicron Laserprodukte GmbH, Rodgau-Dudenhofen, Germany) with a 488 nm cleanup filter (xX.F488, Omicron Laserprodukte GmbH, Rodgau-Dudenhofen, Germany) was used. Illumination was performed through a 2.5× NA 0.06 EC Epiplan-Neofluar objective (422320-9900-000, Carl Zeiss AG, Göttingen, Germany) and signal was collected either through a 10× NA 0.3 W N-Achroplan objective (420947-9900-000, Carl Zeiss AG, Göttingen, Germany) or through a 20× NA 0.5 W N-Achroplan objective (420957-9900-000, Carl Zeiss AG, Göttingen, Germany). In both setups, a 525/50 single-band bandpass filter (FF03-525/50-25, Semrock/AHF Analysentechnik AG, Tübingen, Germany) and a high-resolution CCD camera (Clara, Andor, Belfast, United Kingdom) were used for detection. Conventionally 21, the illumination axis is defined as x, the rotation axis is defined as y, and the detection axis is defined as z. For convenience, y is parallel to the Earth's gravitational axis. Axes are mentioned in the manuscript or indicated on figures whenever appropriate. The DSLM was further equipped with a 760-nm diode to acquire transmission light images. Three micro-translation stages (M-111.2DG, Physik Instrumente GmbH & Co KG, Karlsruhe, Germany) and a precision rotation stage (M-116.DG, Physik Instrumente GmbH & Co KG, Karlsruhe, Germany) were used for sample translation along x, y and z and rotation around y, respectively.
Embryo collection and preparation. For embryo collection, medfly cultures were removed from the incubator. Old embryos were removed from the fine-meshed gauze and discarded. The cultures were given 10 min for embryo deposition at room temperature (23 ± 1 °C). All embryos (typically 5 to 20 per culture) laid in this time window were transferred to a 100 µm cell strainer (#352360, BD Biosciences, Heidelberg, Germany) with a paint brush. The embryos were moistened in PBS pH 7.4 (10010-023, Gibco Life Technologies GmbH, Darmstadt, Germany), dechorionated for 90 seconds in a 1:9 mixture of ~10% (vol/vol) sodium hypochlorite solution (425044-250 ML, Sigma Aldrich, Taufkirchen, Germany) and PBS and then washed twice in PBS for 60 seconds.
Mounting using the cobweb holder. To keep the medfly embryos mechanically stable and ensure precise movement within the sample chamber during repeated movement and recording sequences (translation along z while recording, rotation around y followed by translation along x, y and z to reposition the embryo before acquisition of the next z stack), we used the cobweb holder mounting method 21,50. The cobweb holder used in this study consists of a stainless-steel cylinder to which a 0.2 mm stainless steel plate with a 2 mm × 4 mm slotted hole is attached. For specimen mounting, a drop of agarose (5-7 µl) was pipetted onto the center of the slotted hole and excess liquid was removed to create an agarose film with a thickness of around 50 µm. Using a small paint brush, dechorionated medfly embryos were placed onto the agarose film with their elongated anterior-posterior axis aligned with the long axis of the slotted hole (Fig. 1b). The embryos were only partially embedded in agarose, thus keeping the distance that the laser beam and the emitted fluorescence must pass through agarose at a minimum while facilitating the necessary gas exchange with the imaging buffer.
Upon insertion into the sample chamber, the cobweb holder was aligned along x (Fig. 1c, first column), which we define as the preliminary direction (orientation −45°) to simplify translation of the embryo into the center of the field of view. For imaging, the cobweb holder was rotated by 45°, which we define as direction 1 (orientation 0°). Along this direction, the steel plate blocked none of the illumination, detection or transmission light paths (Fig. 1c, second and third column), which made it possible to record the embryos in both the transmission light (Fig. 1d) and fluorescence (Fig. 1e) channels. The cobweb holder was rotated around y in steps of 90° to record the specimen along four directions (Supplementary Fig. 1). The dimensions of the cobweb holder allowed the acquisition of z stacks with a range of up to 800 µm. Post-acquisition corrections such as drift compensation were not necessary.
Mounting using the agarose hemisphere method. In LSFM, the lateral resolution (along x and y) exceeds the axial resolution (along z) by a factor of approximately three to four 53. Optical sectioning of insect embryos using the cobweb holder is restricted to planes along the ventro-dorsal and lateral axes. To also achieve optical sectioning along the antero-posterior axis that benefits from the high lateral resolution, we adapted the agarose hemisphere mounting method, which was initially established for T. castaneum embryos 17,18. However, unlike in the initial approach, medfly embryos were attached with their dorsal side to the pole of the agarose hemisphere (Supplementary Fig. 2).
Long-term live imaging and embryo retrieval. In total, nine long-term live imaging datasets (DS) were recorded (Supplementary Table 1). In six out of nine datasets (DS0001-DS0006), embryos were captured in toto under equal conditions for about 60 h with a temporal interval of 30 min, covering approximately 97% of embryonic development (Supplementary Video 1). The remaining datasets cover specific embryonic processes, e.g., pole cell formation (DS0007) and head involution (DS0008 and DS0009). For datasets DS0001-DS0008, embryos were imaged along four directions in the orientations 0°, 90°, 180° and 270°. Embryos from DS0001 and DS0003 were imaged along both ventrolateral-dorsolateral axes, whereas embryos from datasets DS0002, DS0004, and DS0005 were imaged along the ventro-dorsal and lateral axes. For dataset DS0009, the embryo was imaged along two directions in the orientations 0° and 90°. For each acquired plane, the embryo was illuminated with a 488 nm laser beam with a power of 135 µW during a 50 ms exposure time window. Each z stack consisted of 100 or 115 planes with an axial pitch of 2.58 µm. Thus, the datasets were acquired with an x:y:z resolution ratio of 1:1:4 and 1:1:8 for the 10× objective and the 20× objective, respectively. After imaging was completed, embryos were retrieved from the microscope. All embryos hatched morphologically intact. However, the embryo from DS0006 did not develop into a healthy adult, hence the respective image data were excluded from further analysis.
Image processing. The z stacks and respective z maximum projections were rotated around z and cropped to a final size of 500 × 1390 pixels to align the anterior-posterior axis of the embryos with the y axis of the image and place the embryo in the center of the z stacks and z maximum projections. For each direction, all z maximum projections were combined, and all time points were subsequently concatenated to t stacks, which were subjected to a mean transformation as described previously 18 and adjusted in brightness and contrast.
Two-level staging system implementation. For proper temporal quantification of embryonic development and to provide a standard for future studies of the medfly, a comprehensive staging system is necessary. The proposed staging system consists of two levels, i.e., embryogenetic events and stages, relies on five in toto datasets (DS0001-DS0005) that were acquired under equal conditions, and considers exclusively morphogenetic criteria. For the upper level, six consecutive embryogenetic events are specified and denoted with color-coded Roman numerals: (I) blastoderm formation is represented in blue, (II) early gastrulation in cyan, (III) germband elongation in green, (IV) germband retraction in yellow, (V) dorsal closure in orange and (VI) muscular movement in red. A comparison of event-characteristic structures between DS0001-DS0005 is given in Fig. 2. A comprehensive overview of embryonic event onset time points is shown in Supplementary Fig. 3 and summarized in Supplementary Table 2.
Since imaging started approximately 2 h after embryo collection, the first time point (TP0001) was set to 02:00 h of absolute development time.
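The mapping from an imaging time point to absolute development time can be sketched as follows. This is a minimal illustration, assuming the 02:00 h offset of TP0001 and the 30 min interval stated for the in toto datasets; the function names and the total duration of 60.0 h in the usage lines are hypothetical placeholders and are not taken from the datasets.

```python
START_OFFSET_H = 2.0  # TP0001 is set to 02:00 h of absolute development time
INTERVAL_H = 0.5      # 30 min between consecutive time points (in toto datasets)


def absolute_time_h(tp: int) -> float:
    """Absolute development time in hours for a 1-based time point index."""
    if tp < 1:
        raise ValueError("time point indices start at 1 (TP0001)")
    return START_OFFSET_H + (tp - 1) * INTERVAL_H


def format_hhmm(hours: float) -> str:
    """Format a duration given in hours as hh:mm."""
    total_minutes = round(hours * 60)
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


def relative_development(tp: int, total_h: float) -> float:
    """Progress in percent of total development; total_h must be supplied."""
    return 100.0 * absolute_time_h(tp) / total_h


# Illustrative usage; 60.0 h is a placeholder for a measured total duration.
print(format_hhmm(absolute_time_h(1)))   # 02:00
print(relative_development(13, 60.0))    # TP0013 corresponds to 08:00 h
```

For a given dataset, the total duration passed as total_h would be the absolute time of its hatching time point.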
For the lower level, due to the high similarity, the 17 stages from the D. melanogaster staging system 54 were adapted, analogous to a similar study on M. abdita 38, and denoted with Arabic numerals. In consequence, each embryogenetic event consists of one or multiple stages. Major event and stage identifiers are given in Table 1. A comprehensive overview of stage onset time points for DS0001-DS0005 along one orientation is shown in Supplementary Fig. 4 and summarized in Supplementary Table 3.

Dataset alignment.
Of the five datasets (DS0001-DS0005) in which the embryos were imaged in toto and under identical conditions, DS0001 was the dataset with the median development time, so DS0002-DS0005 were adjusted and aligned stage-by-stage onto DS0001. In cases where stages in DS0002-DS0005 were shorter or longer than in DS0001, the first and last time points were aligned to DS0001, resulting in stretching or compression of intermediate periods, respectively. As this leads to non-matching time points, the respective matching time points were interpolated. The interpolated values were used for calculations of the absolute and relative standard deviations (Supplementary Table 4). An overview of embryogenetic events and stages for DS0001-DS0005 is given in Table 1. A comprehensive overview of stage onset time points for DS0001-DS0005 is shown in Supplementary Table 4.
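The stage-by-stage alignment can be illustrated with a short sketch. It assumes that stretching and compression within a stage are linear, which the text does not state explicitly; the function name and the onset values in the usage lines are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def align_time(t, onsets, ref_onsets):
    """Map time t from a dataset's timeline onto the reference timeline.

    onsets and ref_onsets list the stage boundaries (strictly increasing,
    same length) of the dataset and of the reference. Boundaries are mapped
    onto each other exactly; times inside a stage are stretched or
    compressed linearly (an assumption of this sketch).
    """
    if len(onsets) != len(ref_onsets) or len(onsets) < 2:
        raise ValueError("need matching lists with at least two boundaries")
    if not onsets[0] <= t <= onsets[-1]:
        raise ValueError("t lies outside the covered period")
    # Index of the stage that contains t; the final boundary belongs
    # to the last stage.
    i = min(bisect_right(onsets, t) - 1, len(onsets) - 2)
    fraction = (t - onsets[i]) / (onsets[i + 1] - onsets[i])
    return ref_onsets[i] + fraction * (ref_onsets[i + 1] - ref_onsets[i])


# Hypothetical boundaries in hours: two stages, reference vs. one dataset.
reference = [0.0, 4.0, 10.0]
dataset = [0.0, 5.0, 9.0]
print(align_time(2.5, dataset, reference))  # midpoint of stage 1 -> 2.0
print(align_time(7.0, dataset, reference))  # midpoint of stage 2 -> 7.0
```

Swapping the two boundary lists inverts the mapping, which gives the position in the dataset that corresponds to a reference time point and therefore where an interpolated value would be taken.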

Data records
The nine long-term datasets of medfly embryonic development are provided as ZIP-compressed TIFF files and explore six degrees of freedom: the first (x) and second (y) spatial dimensions are obtained simultaneously during one camera exposure period. The third spatial dimension (z) is represented by the optical sections that are recorded while the embryo is moved through the light sheet. The z stacks are saved as individual files using the TIFF-intrinsic container function (indicated as PL(ZS) within the file name). Together, the three spatial dimensions define the volume of view. The further degrees of freedom are the fluorescence channel (one), the direction (typically four) and the time point (up to 126), which are saved individually (indicated as CH, DR or TP within the subfolder or file name, respectively). For convenience, z maximum projections are also provided. These are simplifications of the datasets where one spatial dimension (z) is removed. The projections are provided in two versions, as raw z maximum projections (indicated as PL(ZM) within the subfolder or file name) or as z maximum projections with image adjustment (indicated as PL(ZA) within the subfolder or file name). Respective t stacks are saved as individual files using the TIFF-intrinsic container function (indicated as TP(TS) within the file name); the adjusted versions are further provided as direction montages along x and y (indicated as DR(AX) or (AY) within the file name). For each dataset, all files were compiled into one ZIP folder and deposited as a single record at Zenodo 55-63. In addition to the ZIP folder, each record also contains a downscaled AVI movie of the direction montage along x (12-60 Mb) for fast inspection as well as a machine-readable XLSX metadata file. Metadata optimized for human readability and DOI-based access information are provided in Supplementary

Technical Validation
Microscope calibration. Prior to each imaging assay, the DSLM went through a two-step calibration routine as described previously 18. Thus, typical problems that might occur in LSFM are avoided (e.g., offset or tilt between the light sheet and the focal plane of the detection objective). Laser power (Supplementary Table 1) was measured with an optical power and wavelength meter (OMM-6810B and OMH-6703B, Newport, Irvine, CA, United States) at the exact location where the embryos were positioned during the imaging process.
Quality control. Imaged embryos were raised to adults to ensure that the imaging procedure, e.g., the irradiance by the laser, does not induce any aberrations. The embryos were imaged until hatching, i.e., until the first time point in which only the empty eggshell was captured. When this happened, imaging was stopped and the larva, which was floating in the imaging buffer of the sample chamber, was retrieved with a plastic Pasteur pipette. The larva was placed on larval medium 64 and incubated for several weeks under the same conditions as described for the adult culture in the Methods section until pupation and eclosion occurred. The embryo from dataset DS0006 did not develop into a healthy adult and was therefore excluded from further analysis. The quality of the image data was validated by manual examination of all z maximum projections: for each dataset (nine in total), for each time point (up to 126 per dataset), and for each direction (two to four per dataset and time point).

Stage onset identifiers.
Iconic aspects for almost all stage onsets defined for Drosophila melanogaster 54 could also be identified in Ceratitis capitata. The only deviation, which arose for technical reasons, concerned the transition from stage I-5 to stage II-6. For the fruit fly, one of the central morphogenetic processes is the completion of cellularization, which cannot be recognized in the presented datasets as the fluorescent medfly line provides signals only in the nuclei. Therefore, the identifier for the onset of stage II-6 was defined as the first noticeable movement of the blastoderm nuclei after the 13th synchronous nuclear division.
Temporal resolution. The datasets used to establish the staging system, i.e., DS0001-DS0005, cover, on average, 120 time points. In conjunction with the extrapolation for stage I-1, which equals four time points for each dataset (Supplementary Table 1), this corresponds to a relative temporal resolution of approximately 0.8% of embryonic development. Consequently, stage onset time points may deviate by up to 0.4%, which may be particularly noticeable if a developmental period features extensive morphogenetic changes, for example stage III-8. In the images that were specified as the stage onset time points, the embryos from DS0001 and DS0003 appear to be slightly ahead in development compared with the embryos from the remaining three datasets considering the position of the posterior germband tip (Supplementary Table 3d

Usage Notes
To work with the data, we recommend ImageJ 65,66 or its derivative Fiji 67 as the primary image processing program. Both are open-source software frameworks for the analysis of multi-dimensional biological imaging data. The order of the degrees of freedom (see Data Records section) is compatible with all image processing approaches capable of handling three-dimensional, multi-channel, multi-view dynamic data.
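For scripted access outside ImageJ, the identifiers described in the Data Records section can be read from file names with a small parser. The sketch below is an assumption-laden illustration: the tokens DS, TP, DR, CH and PL and the bracketed codes such as (ZS) or (TS) are taken from the text, but the exact file name layout, the number of digits and the example names are hypothetical and should be checked against the deposited records.

```python
import re

# A two-letter key followed either by a running number or by a bracketed
# two-letter code, e.g. TP0001 or PL(ZS). The layout is assumed.
TOKEN = re.compile(r"(DS|TP|DR|CH|PL)(?:(\d+)|\(([A-Z]{2})\))")


def parse_name(name: str) -> dict:
    """Return the identifiers found in a file or subfolder name."""
    fields = {}
    for key, number, code in TOKEN.findall(name):
        # Numbers become integers; bracketed codes stay as strings.
        fields[key] = int(number) if number else code
    return fields


# Hypothetical file names, used only to illustrate the parser.
print(parse_name("DS0001TP0001DR0001CH0001PL(ZS).tif"))
print(parse_name("DS0003TP(TS)DR(AX)CH0001PL(ZA).tif"))
```

Returning a plain dictionary keeps the parser independent of any imaging library, so the result can be used to sort or filter files before they are opened.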

Code availability
The custom Mathematica (Wolfram Research, version 13.0) script used to subject the t stacks to a mean transformation has been described previously 18 (this function does not require any parameters).

Table 1. Alignment and temporal breakdown of datasets DS0001-DS0005 into embryogenetic events (Roman numerals) and stages (Arabic numerals). Onset time points of embryogenetic events are color-coded. Imaging typically begins with stage I-2; the values for I-1 are extrapolated (see Methods section). TP, imaging time point; time, absolute time passed from the onset of I-1 until the indicated TP; rel dev, relative progress of embryonic development from the onset of I-1 until the indicated TP.